Smart Paper Notes

Scalable Pathogen Detection from Next Generation DNA Sequencing with Deep Learning

Sai Narayanan, Sathyanarayanan N. Aakur, Priyadharsini Ramamurthy, Arunkumar Bagavathi, Vishalini Ramnath, Akhilesh Ramachandran

Category: Machine Learning

2022-11-30

Next-generation sequencing technologies have extended the scope of the Internet of Things (IoT) to include genomics for personalized medicine by making abundant genome data, collected from heterogeneous sources, available at reduced cost. Given the sheer magnitude of the collected data and the significant challenges posed by highly similar genomic structure across species, there is a need for robust, scalable analysis platforms to extract actionable knowledge such as the presence of potentially zoonotic pathogens. The emergence of zoonotic diseases from novel pathogens that can jump species barriers and lead to pandemics, such as the influenza virus in 1918 and SARS-CoV-2 in 2019, underscores the need for scalable metagenome analysis. In this work, we propose MG2Vec, a deep learning-based solution that uses the transformer network as its backbone to learn robust features from raw metagenome sequences for downstream biomedical tasks such as targeted and generalized pathogen detection. Extensive experiments on four increasingly challenging yet realistic diagnostic settings show that the proposed approach can help detect pathogens from uncurated, real-world clinical samples with minimal human supervision in the form of labels. Further, we demonstrate that the learned representations can generalize to completely unrelated pathogens across diseases and species for large-scale metagenome analysis. We provide a comprehensive evaluation of a novel representation learning framework for metagenome-based disease diagnostics with deep learning, and we outline a way forward for extracting and using robust vector representations from low-cost next-generation sequencing to develop generalizable diagnostic tools.

translated by Google Translate

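Road construction projects maintain transportation infrastructure. These projects range from the short-term (e.g., resurfacing or fixing potholes) to the long-term (e.g., adding a shoulder or building a bridge). Deciding what the next construction project is and when to schedule it has traditionally been done through inspection by humans using special equipment. This approach is costly and difficult to scale. An alternative is to use computational approaches that integrate and analyze multiple types of past and present spatiotemporal data to predict the location and time of future road construction. This paper reports on such an approach, which uses a deep-neural-network-based model to predict future constructions. Our model applies both convolutional and recurrent components to a heterogeneous dataset consisting of construction, weather, map, and road-network data. We also report on how we addressed the lack of adequate publicly available data by building a large-scale dataset named "US-Constructions", which includes 6.2 million cases of road construction augmented with a variety of spatiotemporal attributes and road-network features, collected in the contiguous United States (US) between 2016 and 2021. Using extensive experiments on several major US cities, we show the applicability of our work in accurately predicting future constructions, with an average F1 score of 0.85 and an accuracy of 82.2%, outperforming the baselines. Additionally, we show how our training pipeline addresses the spatial sparsity of the data.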


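Experience replay methods, which are an essential part of reinforcement learning (RL) algorithms, are designed to mitigate spurious correlations and biases while learning from temporally dependent data. Roughly speaking, these methods allow us to draw batched data from a large buffer so that these temporal correlations do not hinder the performance of descent algorithms. In this experimental work, we consider the recently developed and theoretically rigorous reverse experience replay (RER), which has been shown to remove such spurious biases in simplified theoretical settings. We combine RER with optimistic experience replay (OER) to obtain RER++, which is stable under neural function approximation. We show experimentally that RER++ performs better than techniques such as prioritized experience replay (PER) on a variety of tasks, with significantly lower computational complexity. It is well known in the RL literature that greedily choosing the examples with the largest TD error (as in OER) or forming mini-batches from consecutive data points (as in RER) performs poorly. However, our method, which combines these techniques, works very well.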
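The two sampling rules named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names, the index-based buffer, and in particular `combined_batch` (one plausible way to pair a large-TD-error pivot with a backward window) are assumptions made for this example.

```python
from typing import List, Sequence


def rer_batches(buffer_len: int, batch_size: int) -> List[List[int]]:
    """Reverse-replay order: consecutive indices, newest transitions first."""
    batches = []
    end = buffer_len  # exclusive upper bound of the next window
    while end > 0:
        start = max(0, end - batch_size)
        # Walk each window backwards so later transitions are replayed first.
        batches.append(list(range(end - 1, start - 1, -1)))
        end = start
    return batches


def oer_batch(td_errors: Sequence[float], batch_size: int) -> List[int]:
    """Greedy selection: indices with the largest absolute TD error."""
    order = sorted(range(len(td_errors)),
                   key=lambda i: abs(td_errors[i]), reverse=True)
    return order[:batch_size]


def combined_batch(td_errors: Sequence[float], batch_size: int) -> List[int]:
    """Hypothetical combination (an assumption, not taken from the abstract):
    pick the transition with the largest |TD error| as a pivot, then replay
    the consecutive transitions ending at that pivot in reverse order."""
    pivot = oer_batch(td_errors, 1)[0]
    start = max(0, pivot - batch_size + 1)
    return list(range(pivot, start - 1, -1))
```

For a buffer of five transitions and a batch size of two, `rer_batches(5, 2)` yields the windows `[4, 3]`, `[2, 1]`, `[0]`, which is the reverse, consecutive ordering the abstract attributes to RER.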


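Web-based interactions can frequently be represented by an attributed graph, and node clustering in such graphs has received much attention lately. Multiple efforts have successfully applied graph convolutional networks (GCNs), though with some limits on accuracy, since GCNs have been shown to suffer from over-smoothing. While other methods (particularly those based on Laplacian smoothing) have reported better accuracy, a fundamental limitation of all of this work is a lack of scalability. This paper addresses this open problem by relating Laplacian smoothing to generalized PageRank and applying a random-walk-based algorithm as a scalable graph filter. This forms the basis of our scalable deep clustering algorithm, RWSL, in which a self-supervised mini-batch training mechanism simultaneously optimizes a deep neural network for the sample-cluster assignment distribution and an autoencoder for a clustering-oriented embedding. Using 6 real-world datasets and 6 clustering metrics, we show that RWSL improves on several recent baselines. Most notably, we show that RWSL, unlike all other deep clustering frameworks, can continue to scale beyond graphs with more than one million nodes. We also demonstrate how RWSL can perform node clustering on a graph with 1.8 billion edges using only a single GPU.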
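The idea of using generalized PageRank as a graph filter can be illustrated with a small, exact computation of the smoothed features sum_k w_k P^k X, where P is the row-normalized adjacency matrix. This is only a sketch under stated assumptions: the function name `gpr_filter`, the adjacency-list input, and the choice of weights are hypothetical, and the code uses plain power iteration rather than the random-walk approximation that the abstract credits for scalability.

```python
from typing import Dict, List


def gpr_filter(adj: Dict[int, List[int]],
               features: List[List[float]],
               weights: List[float]) -> List[List[float]]:
    """Return sum_k weights[k] * P^k X, with P the row-normalized adjacency.

    adj maps each node index to its neighbor list; nodes without neighbors
    contribute zeros after the first propagation step.
    """
    n, d = len(features), len(features[0])
    current = [row[:] for row in features]  # P^0 X
    out = [[weights[0] * v for v in row] for row in current]
    for w in weights[1:]:
        nxt = [[0.0] * d for _ in range(n)]
        for u in range(n):
            nbrs = adj[u]
            for v in nbrs:
                for j in range(d):
                    # One propagation step: average the neighbors' features.
                    nxt[u][j] += current[v][j] / len(nbrs)
        current = nxt
        for u in range(n):
            for j in range(d):
                out[u][j] += w * current[u][j]
    return out
```

On a three-node path graph with a single feature concentrated on node 0, `gpr_filter({0: [1], 1: [0, 2], 2: [1]}, [[1.0], [0.0], [0.0]], [0.5, 0.5])` spreads part of that feature to node 1, which is the smoothing effect the filter is meant to provide.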
